A new approach to describing correlation properties of complex dynamic systems
with long-range memory, based on the concept of additive Markov chains (Phys. Rev. E
68, 061107 (2003)), is developed. An equation connecting the memory and correlation
functions of the system under study is presented. This equation allows one to reconstruct
a memory function from the correlation function of the system. The effectiveness
and robustness of the proposed method are demonstrated by simple model examples.
Memory functions of concrete coarse-grained literary texts are found, and their
universal power-law behavior at long distances is revealed.
The problem of long-range correlated dynamic systems (LRCS) has been under study for a long time in many areas of contemporary physics, biology, economics, etc. [4, 5].

An important example of complex LRCS is naturally written texts […, 8, 9].
An efficient method for investigating long-range correlations in such systems consists in
decomposing the space of states into a finite number of parts labelled by definite
symbols, which are naturally ordered according to the dynamics of the system. The most
frequently used decomposition divides the phase space into two parts. In other words,
the approach maps two kinds of states onto two symbols, say 0 and 1. Thus, the problem
is reduced to investigating the statistical properties of binary sequences.

It might be thought that coarse graining could result in losing, at least, the short-range
memory in the sequence. […] sequences with a many-valued alphabet. They demonstrated that
mapping a given sequence onto a small-alphabet sequence does not necessarily preserve the
long-range correlations present in the initial text. Moreover, in general, the coarse-graining
procedure can lead to spurious long-range correlations. However, as was shown in Ref. […],
coarse graining does not destroy the existing correlations in many real symbolic systems.
The statistical properties of coarse-grained texts depend only weakly on the kind of
mapping. This implies that only a small fraction of all possible mappings can slightly
change the initial correlations in the system. So, there is no point in coding every symbol
(associating every part of the phase space of the system with its binary code) to analyze the
correlation properties of texts, as is done, for example, in Ref. […]; it is sufficient
to use the coarse-graining procedure.

One way to gain insight into the nature of correlations in a system is to construct a
mathematical object (for example, a correlated sequence of symbols) possessing the same
statistical properties as the initial system. There exist many algorithms for generating
long-range correlated sequences: the inverse Fourier transformation […], the Li
expansion-modification method […], the Voss procedure of successive random additions […],
the correlated Lévy walks […], etc. […]. We believe that, among the above-mentioned methods,
using many-step Markov chains is one of the most important, because it offers the possibility
of constructing a random sequence with definite correlation properties in the most natural
way. This was demonstrated in Ref. […], where the concept of the additive Markov chain with a
step-like memory function (which allows analytical treatment) was introduced. There exist
some dynamic systems (coarse-grained sequences of eukaryotic DNA and dictionaries) whose
correlation properties can be well described by this model. The concept of additivity,
originally introduced in Ref. […], was later generalized to binary non-stationary Markov
chains […]. Another generalization was based on the consideration of Markov sequences with a
many-valued alphabet […].

In the present work, we continue the investigation of additive Markov chains with more
complex memory functions. An equation connecting mutually complementary characteristics
of a random sequence, i.e., the memory and correlation functions, is obtained. Once the
memory function of the original random sequence is found from an analysis of its
statistical properties, namely, its correlation function, we can build the corresponding
Markov chain, which possesses the same statistical properties as the initial sequence.
The effectiveness and robustness of the proposed method are demonstrated by simple model
examples. This method is essential for some applications, e.g., for the construction of
correlated sequences of elements that can be used to fabricate effective filters of
electrical or optical signals, Ref. […]. The suggested method allowed us to find the memory
functions of concrete coarse-grained literary texts and to reveal their universal power-law
behavior at long distances.

Let us consider a homogeneous binary sequence of symbols, $a_i \in \{0, 1\}$. To determine
the $N$-step Markov chain, we have to introduce the conditional probability
$P(a_i \mid a_{i-N}, a_{i-N+1}, \dots, a_{i-1})$ of the occurrence of a definite symbol $a_i$
(for example, $a_i = 1$) after the $N$-word $T_{N,i}$, where $T_{N,i}$ stands for the sequence
of symbols $a_{i-N}, a_{i-N+1}, \dots, a_{i-1}$. Thus, it is necessary to define $2^N$ values of
the $P$-function, one for each possible configuration of the symbols in the $N$-word
$a_{i-N}, a_{i-N+1}, \dots, a_{i-1}$. The value of $N$ is referred to as the memory length of
the Markov chain.

Since we are going to deal with sequences possessing very long memory lengths, for which
specifying all $2^N$ values is impractical, we need to simplify the $P$-function. We suppose
that it has the additive form,

$$P(a_i = 1 \mid T_{N,i}) = \sum_{k=1}^{N} f(a_{i-k}, k), \qquad (1)$$

which corresponds to the additive influence of the previous symbols upon the generated one.
The value of $f(a_{i-k}, k)$ is the contribution of the symbol $a_{i-k}$ to the conditional
probability of the occurrence of the symbol unity at the $i$th site. The homogeneity of the
Markov chain is provided by the $i$-independence of the conditional probability, Eq. (1).
Let us rewrite Eq. (1) in an equivalent form,

$$P(a_i = 1 \mid T_{N,i}) = b + \sum_{r=1}^{N} F(r)\,(a_{i-r} - b), \qquad (2)$$

with

$$b = \frac{\sum_{r=1}^{N} f(0, r)}{1 - \sum_{r=1}^{N} F(r)}, \qquad F(r) = f(1, r) - f(0, r). \qquad (3)$$
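The equivalence of Eq. (1) and Eqs. (2)-(3) can be checked numerically. The following is a minimal Python/NumPy sketch, not the authors' code: the helper names `p_additive` and `p_memory_form` and the randomly drawn contributions $f(a, r)$ are hypothetical, chosen only so that all probabilities stay in $[0, 1]$. It evaluates both forms on every possible $N$-word.

```python
import itertools
import numpy as np

def p_additive(word, f):
    """Eq. (1): P(a_i = 1 | T_{N,i}) = sum_k f(a_{i-k}, k).
    word[k-1] holds a_{i-k}; f[a, k-1] holds f(a, k)."""
    return sum(f[a, k] for k, a in enumerate(word))

def p_memory_form(word, f):
    """Eqs. (2)-(3): the same probability expressed through b and F(r)."""
    F = f[1] - f[0]                   # F(r) = f(1, r) - f(0, r)
    b = f[0].sum() / (1.0 - F.sum())  # requires sum_r F(r) != 1
    return b + sum(F[r] * (a - b) for r, a in enumerate(word))

# Hypothetical contributions f(a, r) for N = 4, kept small so that
# every conditional probability stays inside [0, 1]
rng = np.random.default_rng(0)
N = 4
f = rng.uniform(0.0, 1.0 / (2 * N), size=(2, N))

# Compare Eq. (1) with Eqs. (2)-(3) on all 2^N possible N-words
max_diff = max(abs(p_additive(w, f) - p_memory_form(w, f))
               for w in itertools.product((0, 1), repeat=N))
print(f"largest discrepancy: {max_diff:.1e}")
```

The discrepancy is at the level of floating-point rounding, since $b + \sum_r F(r)(a_{i-r} - b) = b\,[1 - \sum_r F(r)] + \sum_r F(r)\,a_{i-r}$ reproduces Eq. (1) term by term.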

The constant $b$ is the value of $a_i$ averaged over the whole sequence, $b = \bar{a}$:

$$\bar{a} = \lim_{M \to \infty} \frac{1}{2M + 1} \sum_{i=-M}^{M} a_i. \qquad (4)$$
Indeed, by the ergodicity of the Markov chain, $\bar{a}$ coincides with the value of $a_i$
averaged over the ensemble of realizations of the Markov chain. So, we can write

$$\bar{a} = \Pr(a_i = 1) = \sum_{T_{N,i}} P(a_i = 1 \mid T_{N,i}) \Pr(T_{N,i}). \qquad (5)$$

Here $\Pr(a_i = 1)$ is the probability of the occurrence of the symbol unity, and
$\Pr(T_{N,i})$ is the probability of the occurrence of the definite word $T_{N,i}$ in the
ensemble of sequences under consideration. Substituting $P(a_i = 1 \mid T_{N,i})$ from Eq. (2)
into Eq. (5) and taking into account the obvious relation $\sum_{T_{N,i}} \Pr(T_{N,i}) = 1$,
one gets

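$$\bar{a} = b - b \sum_{r=1}^{N} F(r) + \sum_{r=1}^{N} F(r) \sum_{T_{N,i}} \Pr(T_{N,i})\, a_{i-r}. \qquad (6)$$

The sum $\sum_{T_{N,i}} \Pr(T_{N,i})\, a_{i-r}$ does not depend on the subscript $r$ and
obviously coincides with $\bar{a}$. So, we have $\bar{a} = b + (\bar{a} - b) \sum_{r} F(r)$, i.e.,
$(\bar{a} - b)\left[1 - \sum_{r} F(r)\right] = 0$. Since $\sum_{r} F(r) \neq 1$ (otherwise $b$ in
Eq. (3) is undefined), we conclude that $b = \bar{a}$. Thus, we can rewrite Eq. (2) as

$$P(a_i = 1 \mid T_{N,i}) = \bar{a} + \sum_{r=1}^{N} F(r)\,(a_{i-r} - \bar{a}). \qquad (7)$$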

We refer to $F(r)$ as the memory function (MF). It describes the strength of the influence
of the previous symbol $a_{i-r}$ upon the generated one, $a_i$. To the best of our knowledge,
the concept of the memory function for many-step Markov chains was introduced in Ref. […].
The function $P(\cdot \mid \cdot)$ contains complete information about the correlation
properties of the Markov chain. Typically, the correlation function and other moments are
employed as the input characteristics for the description of correlated random sequences.
However, the correlation function describes not only the direct interconnection of the
elements $a_i$ and $a_{i+r}$, but also takes into account their indirect interaction via all
other intermediate elements. Our approach operates with the "original" characteristic of the
system, specifically, the memory function. The correlation and memory functions are mutually
complementary characteristics of a random sequence in the following sense. Numerical analysis
of a given random sequence enables one to determine the correlation function directly, but not
the memory function. On the other hand, it is possible to construct a random sequence using
the memory function, but not the correlation function. Therefore, we believe that investigating
the memory function of correlated systems will permit one to disclose the intrinsic properties
that give rise to the correlations between the elements.
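Since a sequence can be constructed from the memory function but not directly from the correlation function, a generator based on Eq. (7) is the basic tool. The sketch below is illustrative only: `generate_additive_chain` is a hypothetical name, and the independent initialization of the first $N$ symbols and the clipping of $P$ to $[0, 1]$ are our assumptions rather than details from the text. For a one-step memory, $F(1) = 0.5$, Eq. (13) predicts $K(1)/K(0) = F(1)$, which the usage lines check empirically.

```python
import numpy as np

def generate_additive_chain(F, a_bar, length, seed=None):
    """Binary additive Markov chain from Eq. (7):
    P(a_i = 1 | T_{N,i}) = a_bar + sum_r F(r) (a_{i-r} - a_bar),
    with F[r-1] holding F(r) for r = 1..N."""
    F = np.asarray(F, dtype=float)
    N = len(F)
    rng = np.random.default_rng(seed)
    a = np.zeros(length)
    # Assumption: the first N symbols are drawn independently with P(1) = a_bar
    a[:N] = rng.random(N) < a_bar
    u = rng.random(length)
    for i in range(N, length):
        past = a[i - N:i][::-1]                  # a_{i-1}, ..., a_{i-N}
        p = a_bar + F @ (past - a_bar)
        a[i] = u[i] < min(max(p, 0.0), 1.0)      # clip in case F gives p outside [0, 1]
    return a.astype(np.int8)

# Usage: one-step memory F(1) = 0.5, non-biased chain
seq = generate_additive_chain([0.5], a_bar=0.5, length=100_000, seed=1)
x = seq - seq.mean()
mean_est = seq.mean()
rho1 = np.mean(x[:-1] * x[1:]) / np.mean(x * x)   # K(1)/K(0), expected near F(1)
print(mean_est, rho1)
```

The loop is inherently sequential, because each symbol depends on the $N$ symbols generated just before it.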
The dichotomic symbols of a Markov chain can be thought of as a sequence of states
of a particle participating in a correlated Brownian motion. Every element of
the sequence corresponds to an instantaneous change of the particle's coordinate. Every
$L$-word (a subsequence of $L$ consecutive symbols) can be regarded as one of the
realizations of the ensemble of correlated Brownian trajectories in the "temporal" interval
$L$. This point of view on the symbolic sequence makes it possible to use statistical
methods for investigating dynamic systems.

We consider the distribution $W_L(k)$ of words of definite length $L$ by the number $k$ of
unities in them, $k_i(L) = \sum_{l=1}^{L} a_{i+l}$, and the variance $D(L)$ of $k_i(L)$,

$$D(L) = \langle (k - \langle k \rangle)^2 \rangle, \qquad (8)$$

where the average value of $g(k)$ is defined as $\langle g(k) \rangle = \sum_{k=0}^{L} g(k) W_L(k)$.
It follows from Eq. (7) that positive MF values result in persistent diffusion, where previous
displacements of the Brownian particle in some direction provoke its subsequent displacement
in the same direction. Negative values of the MF correspond to anti-persistent diffusion,
where changes in the direction of motion are more probable. In terms of the Ising model
with long-range particle interactions, which can be naturally associated with Markov
chains, positive (negative) values of the MF correspond to ferromagnetic
(antiferromagnetic) interaction of particles. The additive form of the conditional probability
function corresponds to pair interactions, disregarding many-particle interactions.
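A minimal sketch of estimating $D(L)$ from a single long sequence, assuming ergodicity so that all overlapping $L$-words stand in for the ensemble distribution $W_L(k)$. The function name `variance_D` is hypothetical. Cumulative sums give every $k_i(L)$ in one vectorized step; an alternating sequence (extreme anti-persistence) has $D(2) = 0$, while an uncorrelated non-biased sequence gives $D(L) \approx L/4$.

```python
import numpy as np

def variance_D(a, L):
    """Eq. (8): variance D(L) of k_i(L), the number of unities in an L-word,
    estimated over all overlapping L-words of the sequence (L >= 1)."""
    a = np.asarray(a, dtype=float)
    c = np.concatenate(([0.0], np.cumsum(a)))
    k = c[L:] - c[:-L]        # k_i(L) for every window start i
    return k.var()            # <(k - <k>)^2> under the empirical W_L(k)

# Perfectly alternating sequence: strongly anti-persistent "walk"
alt = np.tile([0, 1], 500)
print(variance_D(alt, 1), variance_D(alt, 2))   # 0.25 0.0

# Non-biased uncorrelated sequence: ordinary diffusion, D(L) close to L/4
iid = np.random.default_rng(3).integers(0, 2, size=200_000)
print(variance_D(iid, 100) / 25.0)
```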

The memory function used in Refs. […] was characterized by step-like behavior
and defined by two parameters only: the memory depth and the strength of the correlations
between symbols. Such a memory function describes only one type of correlation in a given
system, persistent or anti-persistent, which results in a super- or sub-linear dependence of
$D(L)$ on $L$. However, both types of correlations can be observed at different scales in the
same system. Thus, more complex memory functions are needed for a detailed description of
systems with both types of correlations. In addition, we need a relation connecting the
mutually complementary characteristics of a random sequence, the memory and correlation
functions.

We suggest below two methods for finding the memory function $F(r)$ of a random binary
sequence with a known correlation function. The first one is based on the minimization of a
"distance" Dist between the Markov chain generated by means of the sought-for MF and the
initial sequence of symbols. This distance is determined by the formula

$$\mathrm{Dist} = \overline{\left(a_i - P(a_i = 1 \mid T_{N,i})\right)^2} = \lim_{M \to \infty} \frac{1}{2M + 1} \sum_{i=-M}^{M} \left(a_i - P(a_i = 1 \mid T_{N,i})\right)^2, \qquad (9)$$

with the conditional probability $P$ defined by Eq. (7).

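Let us express distance (9) in terms of the correlation function,

$$K(r) = \overline{a_i a_{i+r}} - \bar{a}^2, \qquad K(0) = \bar{a}(1 - \bar{a}), \qquad K(-r) = K(r). \qquad (10)$$

From Eqs. (7) and (9), one obtains

$$\begin{aligned}
\mathrm{Dist} &= \sum_{r,r'=1}^{N} \overline{(a_{i-r} - \bar{a})(a_{i-r'} - \bar{a})}\, F(r) F(r') - 2 \sum_{r=1}^{N} \overline{(a_i - \bar{a})(a_{i-r} - \bar{a})}\, F(r) + \overline{(a_i - \bar{a})^2} \\
&= \sum_{r,r'=1}^{N} K(r - r') F(r) F(r') - 2 \sum_{r=1}^{N} K(r) F(r) + K(0).
\end{aligned} \qquad (11)$$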
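The two ingredients of Eq. (11) can be sketched as follows, assuming NumPy and the hypothetical helper names `correlation_K` and `dist_from_K`. The correlation function is estimated by a sample average over one finite sequence, which stands in for the limit in Eq. (10). For the alternating sequence 0101..., the choice $F(1) = -1$ gives $P = 1 - a_{i-1}$, which predicts every symbol exactly, so Dist vanishes; $F(1) = 0$ leaves $\mathrm{Dist} = K(0)$.

```python
import numpy as np

def correlation_K(a, r_max):
    """Eq. (10): K(r) = mean(a_i a_{i+r}) - a_bar^2 for r = 0..r_max."""
    a = np.asarray(a, dtype=float)
    n, a_bar = len(a), a.mean()
    return np.array([np.mean(a[:n - r] * a[r:]) - a_bar**2
                     for r in range(r_max + 1)])

def dist_from_K(K, F):
    """Eq. (11): Dist = sum_{r,r'} K(r-r') F(r) F(r') - 2 sum_r K(r) F(r) + K(0),
    where K[r] holds K(r), r = 0..N, and K(-r) = K(r)."""
    K = np.asarray(K, dtype=float)
    F = np.asarray(F, dtype=float)
    idx = np.arange(1, len(F) + 1)
    T = K[np.abs(idx[:, None] - idx[None, :])]   # matrix of K(r - r')
    return F @ T @ F - 2 * K[1:len(F) + 1] @ F + K[0]

alt = np.tile([0, 1], 500)                # alternating 0101...
K = correlation_K(alt, 2)                 # values 0.25, -0.25, 0.25
print(K, dist_from_K(K, [-1.0]), dist_from_K(K, [0.0]))
```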



The minimization equation,

$$\frac{\delta\, \mathrm{Dist}}{\delta F(r)} = 2 \sum_{r'=1}^{N} K(r - r') F(r') - 2 K(r) = 0, \qquad (12)$$

yields the relationship between the correlation and memory functions,

$$K(r) = \sum_{r'=1}^{N} F(r') K(r - r'), \qquad r \geq 1. \qquad (13)$$

Equation (13) can also be derived by straightforward calculation of the average $\overline{a_i a_{i+r}}$ in
Eq. (10) using definition (7) of the memory function.
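Equation (13) is a linear system whose matrix $K(r - r')$ is symmetric Toeplitz, so a dense solver suffices for moderate $N$. A minimal sketch, assuming NumPy; `memory_from_correlation` is a hypothetical name. For a one-step chain, Eq. (13) gives $K(r) = F(1)^r K(0)$, so feeding in $K(r) = 0.25 \cdot 0.5^r$ should return $F = (0.5, 0, 0)$ for $N = 3$.

```python
import numpy as np

def memory_from_correlation(K, N):
    """Solve Eq. (13), K(r) = sum_{r'=1}^{N} F(r') K(r - r') for r = 1..N.
    K[r] holds K(r) for r = 0..N; K(-r) = K(r) fills the matrix."""
    K = np.asarray(K, dtype=float)
    idx = np.arange(1, N + 1)
    T = K[np.abs(idx[:, None] - idx[None, :])]   # symmetric Toeplitz K(r - r')
    return np.linalg.solve(T, K[1:N + 1])

# Check: a one-step chain with F(1) = 0.5 and a_bar = 1/2 has K(r) = 0.25 * 0.5**r
K = 0.25 * 0.5 ** np.arange(4)
F = memory_from_correlation(K, N=3)
print(F)   # close to [0.5, 0, 0]
```

For large $N$, `scipy.linalg.solve_toeplitz` (Levinson recursion) could replace the dense `numpy.linalg.solve` call.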

The second method, which follows from the first one, establishes a relationship between the
memory function $F(r)$ and the variance $D(L)$,

$$M(r, 0) = \sum_{r'=1}^{N} F(r') M(r, r'), \qquad (14)$$

$$M(r, r') = D(r - r') - \left\{ D(-r') + r\left[ D(-r' + 1) - D(-r') \right] \right\}.$$

It is a set of linear equations for $F(r)$ with coefficients $M(r, r')$ determined by $D(r)$;
it follows from Eq. (13) by expressing $K$ through $D$ and summing twice over $r$. The
relations $K(r) = [D(r - 1) - 2D(r) + D(r + 1)]/2$, obtained in Ref. […], and $D(-r) = D(r)$
are used here.
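A sketch of the second method, under the same assumptions as above (NumPy, hypothetical helper names). Because $M(1, r') = 0$ for every $r'$, the row $r = 1$ of Eq. (14) is empty, so the sketch uses rows $r = 2, \dots, N + 1$; each such row is a weighted sum of Eq. (13) taken at $1, \dots, r - 1$, so both methods should return the same $F(r)$. The exact $D(L)$ is built from $K(r)$ via $D(L) = L K(0) + 2\sum_{j=1}^{L-1}(L - j) K(j)$, the variance of a sum of $L$ correlated symbols, and the relation $K(r) = [D(r-1) - 2D(r) + D(r+1)]/2$ is checked along the way.

```python
import numpy as np

def variance_from_K(K, L_max):
    """D(L) for L = 0..L_max from K(r), r = 0..L_max-1:
    D(L) = L K(0) + 2 sum_{j=1}^{L-1} (L - j) K(j)."""
    K = np.asarray(K, dtype=float)
    D = np.zeros(L_max + 1)
    for L in range(1, L_max + 1):
        j = np.arange(1, L)
        D[L] = L * K[0] + 2 * np.sum((L - j) * K[j])
    return D

def memory_from_variance(D, N):
    """Solve Eq. (14) for F(1..N); D[L] holds D(L) for L = 0..N+1, D(-L) = D(L).
    Rows r = 2..N+1 are used because the row r = 1 vanishes identically."""
    def Dv(x):
        return D[abs(x)]

    def M(r, rp):
        return Dv(r - rp) - (Dv(-rp) + r * (Dv(-rp + 1) - Dv(-rp)))

    rows = range(2, N + 2)
    A = np.array([[M(r, rp) for rp in range(1, N + 1)] for r in rows])
    rhs = np.array([M(r, 0) for r in rows])
    return np.linalg.solve(A, rhs)

N = 3
K = 0.25 * 0.5 ** np.arange(N + 1)        # one-step example, K(0..N)
D = variance_from_K(K, N + 1)             # D(0..N+1)
F = memory_from_variance(D, N)
K_back = (D[:-2] - 2 * D[1:-1] + D[2:]) / 2   # K(r) for r = 1..N recovered from D
print(F, K_back)
```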

Let us verify the robustness of our method by numerical simulations. We consider a
model "triangle" memory function,

$$F(r) = 0.008 \times \begin{cases} r, & 1 \leq r < 10, \\ 20 - r, & 10 \leq r < 20, \\ 0, & r \geq 20, \end{cases} \qquad (15)$$
FIG. 1: The initial memory function, Eq. (15) (solid line), and the reconstructed one (dots) vs the
distance $r$. Inset: the correlation function $K(r)$ obtained by numerical analysis of the sequence
constructed by means of the memory function Eq. (15).
FIG. 2: The model correlation function $K(r)$ described by Eq. (16) (solid line). The dots correspond
to the reconstructed correlation function. Inset: the memory function $F(r)$ obtained by numerical
solution of Eq. (13) with the correlation function Eq. (16).

This memory function is presented in Fig. 1 by the solid line. Using Eq. (7), we construct a
random non-biased ($\bar{a} = 1/2$) sequence of symbols $\{0, 1\}$. Then, with the aid of the
constructed binary sequence of length $10^6$, we numerically calculate the correlation function
$K(r)$. The result of these calculations is presented in the inset of Fig. 1. One can see that
the correlation function $K(r)$ roughly mimics the memory function $F(r)$ over the region
$1 \leq r \leq 20$. In the region $r > 20$, the memory function is equal to zero, but the
correlation function does not vanish […]. Then, using the obtained correlation function $K(r)$,
we numerically solve Eq. (13). The result is shown in Fig. 1 by dots. One can see good agreement
between the initial, Eq. (15), and reconstructed memory functions $F(r)$.
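The numerical experiment above can be reproduced at a reduced scale. This is a sketch under stated assumptions, not the authors' code: it uses a chain of $2 \times 10^5$ symbols instead of $10^6$, an arbitrary random seed, a burn-in of 1000 symbols, and plain sample averages for $K(r)$. It generates the chain from Eq. (15) via Eq. (7), estimates $K(r)$, solves Eq. (13) with $N = 20$, and compares the reconstructed and initial memory functions. Because $\sum_r F(r) = 0.8$ here, the conditional probabilities stay within $[0.1, 0.9]$.

```python
import numpy as np

# Eq. (15): triangle memory function, F_true[r-1] = F(r) for r = 1..20
r = np.arange(1, 21)
F_true = 0.008 * np.where(r < 10, r, 20 - r)
N, a_bar = len(F_true), 0.5

# Generate a non-biased chain with Eq. (7); shorter than 10^6 to keep it quick
n = 200_000
rng = np.random.default_rng(2024)
a = np.zeros(n)
a[:N] = rng.random(N) < a_bar
u = rng.random(n)
for i in range(N, n):
    p = a_bar + F_true @ (a[i - N:i][::-1] - a_bar)   # stays in [0.1, 0.9] here
    a[i] = u[i] < p
a = a[1_000:]                                         # discard a burn-in segment

# Estimate K(r), r = 0..N, as in Eq. (10), then solve Eq. (13) for F(r)
m, mean_a = len(a), a.mean()
K = np.array([np.mean(a[:m - k] * a[k:]) - mean_a**2 for k in range(N + 1)])
T = K[np.abs(r[:, None] - r[None, :])]
F_rec = np.linalg.solve(T, K[1:])

err = np.max(np.abs(F_rec - F_true))
corr = np.corrcoef(F_rec, F_true)[0, 1]
print(f"max |F_rec - F_true| = {err:.4f}, correlation = {corr:.3f}")
```

Estimating `K` beyond $r = 20$ from the same sequence would show the nonzero tail of the correlation function where the memory function already vanishes.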

The main and highly nontrivial result of our paper is the ability to construct a
binary sequence with an arbitrary prescribed correlation function by means of Eq. (13). As
an example, let us consider the model correlation function

$$K(r) = 0.1\, \frac{\sin r}{r}, \qquad (16)$$

presented by the solid line in Fig. 2. We solve Eq. (13) numerically to find the memory
function $F(r)$ using this correlation function. The result is presented in the inset of Fig. 2.
Then we construct the binary Markov chain using the obtained memory function $F(r)$. To check
the robustness of the method, we calculate the correlation function $K(r)$ of the constructed
chain (the dots in Fig. 2) and compare it with Eq. (16). One can see excellent agreement
between the initial and reconstructed correlation functions.
FIG. 3: The normalized variance $D_n(L)$ for the coarse-grained text of the Bible (solid line) and for
the sequence generated by means of the reconstructed memory function $F(r)$ (dots). The dotted
straight line describes non-biased non-correlated Brownian diffusion, $D_0(L) = L/4$. The inset
demonstrates the anti-persistent dependence of the ratio $D_n(L)/D_0(L)$ on $L$ at short distances.



Let us demonstrate the effectiveness of our concept of additive Markov chains in
investigating the correlation properties of coarse-grained literary texts. First, we use the
coarse-graining procedure and map the letters of the text of the Bible [21] onto the symbols
zero and unity (here, the letters a–m are mapped to 0 and n–z to 1). Then we examine the
correlation properties of the constructed sequence and numerically calculate the variance
$D(L)$. The result for the normalized variance $D_n(L) = D(L)/[4\bar{a}(1 - \bar{a})]$ is
presented by the solid line in Fig. 3. The denominator $4\bar{a}(1 - \bar{a})$ is inserted to
take into account the unequal numbers of zeros and unities in coarse-grained literary texts.
The straight dotted line in this figure describes the variance $D_0(L) = L/4$, which corresponds
to non-biased non-correlated Brownian diffusion. The deviation of the solid line from the
dotted one demonstrates the existence of correlations in the text. It is clearly seen that the
diffusion is anti-persistent at small distances, $L \lesssim 300$ (see the inset of Fig. 3),
whereas it is persistent at long distances.
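The coarse-graining and normalization steps can be sketched as follows. The mapping a–m → 0, n–z → 1 is from the text; ignoring case and skipping all non-letter characters are our assumptions, and a Russian text would need its own split of the alphabet. The helper names `coarse_grain` and `normalized_variance` are hypothetical. Note that $D_n(1) = 1/4$ for any non-constant sequence, matching $D_0(1) = 1/4$, so deviations from $D_0(L)$ can appear only at $L > 1$.

```python
import numpy as np

def coarse_grain(text):
    """Map letters a-m to 0 and n-z to 1.
    Assumption: case is ignored and every other character is skipped."""
    return np.array([0 if ch <= "m" else 1
                     for ch in text.lower() if "a" <= ch <= "z"], dtype=float)

def normalized_variance(a, L):
    """D_n(L) = D(L) / [4 a_bar (1 - a_bar)], to be compared with D_0(L) = L/4."""
    c = np.concatenate(([0.0], np.cumsum(a)))
    k = c[L:] - c[:-L]                  # unities in every overlapping L-word
    a_bar = a.mean()
    return k.var() / (4 * a_bar * (1 - a_bar))

seq = coarse_grain("Abc, nop!")
print(seq, normalized_variance(seq, 1))   # [0. 0. 0. 1. 1. 1.] 0.25
```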

The memory function $F(r)$ for the coarse-grained text of the Bible at $r < 300$, obtained by
numerical solution of Eq. (14), is shown in Fig. 4. At long distances, $r > 300$, the memory
function can be nicely approximated by a decreasing power function with prefactor 0.25,
$F(r) = 0.25\, r^{-\nu}$, which is presented by the dash-dotted line in the inset of Fig. 4.
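A power-law fit of the long-distance tail can be sketched as an ordinary least-squares line on log-log axes. This is one common choice, not necessarily the fitting procedure used here; `fit_power_law` and the exponent 1.1 used in the synthetic check are hypothetical and not taken from the text. Negative or zero values of $F(r)$ cannot be log-transformed and are dropped.

```python
import numpy as np

def fit_power_law(r, F):
    """Least-squares fit of F(r) = A * r**(-nu) on log-log axes.
    Only points with F(r) > 0 can enter the fit."""
    r = np.asarray(r, dtype=float)
    F = np.asarray(F, dtype=float)
    keep = F > 0
    slope, intercept = np.polyfit(np.log(r[keep]), np.log(F[keep]), 1)
    return np.exp(intercept), -slope      # (A, nu)

# Synthetic check with a hypothetical exponent nu = 1.1 (not taken from the text)
r = np.arange(300, 3000)
A, nu = fit_power_law(r, 0.25 * r ** -1.1)
print(A, nu)   # close to 0.25 and 1.1
```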
FIG. 4: The memory function $F(r)$ for the coarse-grained text of the Bible at short distances.
Inset: the power-law decreasing portions of the $F(r)$ plots for several texts. The dots correspond
to "Pygmalion" by B. Shaw, and the solid line is the power-law fit of this function. The
dash-dotted and dashed lines correspond to the Bible in English and Russian, respectively.

Note that the short region $r \lesssim 40$ of the negative (anti-persistent) memory function gives
rise to a much longer range, $L \sim 300$, of anti-persistent behavior of the variance $D(L)$.

Our study reveals the existence of two characteristic regions with different behavior of the
memory function and, correspondingly, persistent and anti-persistent portions of the $D(L)$
dependence. This appears to be a prominent feature of texts written in any language.
The positive persistent portions of the memory functions are given in the inset of Fig. 4 for the
coarse-grained English and Russian texts of the Bible (dash-dotted and dashed lines,
Refs. [21] and […], respectively). In addition, for comparison, the memory function of the
coarse-grained text of "Pygmalion" by B. Shaw […] is presented in the same inset (dots);
its power-law fit is shown by the solid line.

It is interesting to note that the memory function of each text mimics its correlation
function, as was found for the model example, Eq. (16). This is confirmed by Fig. 5,
where the correlation function of the coarse-grained text of the Bible is shown. One can see that
its behavior at both short and long scales is similar to that of the memory function presented in
Fig. 4. However, the exponents in the power-law approximations of the $K(r)$ and $F(r)$ functions
differ substantially.
FIG. 5: The correlation function $K(r)$ for the coarse-grained text of the Bible at short distances.
Inset: the power-law decreasing portion of the $K(r)$ plot for the same text; the solid line
is the power-law fit of this function.



Thus, we have demonstrated the efficiency of describing symbolic sequences with
long-range correlations in terms of the memory function. An equation connecting the memory
and correlation functions of the system under study is obtained. This equation allows one to
reconstruct the memory function from the correlation function of the system. In fact,
the memory function appears to be a suitable and informative "visiting card" of any symbolic
stochastic process. The effectiveness and robustness of the proposed method are demonstrated
by simple model examples. Memory functions for some concrete examples of coarse-grained
literary texts are constructed, and their power-law behavior at long distances is revealed.
We have thereby shown the complex organization of literary texts, in contrast to the
previously discussed simple power-law decrease of correlations […].

Our theory does not describe all statistical properties of binary symbolic sequences. For
example, it cannot reflect the directivity of texts, since the theory is based on the
correlation function, which is even by definition. This property can be revealed only by
using ternary (or higher-order) correlation functions. The linguistic aspects of the problem
also require a regular and systematic study.

We have examined the simplest example of random sequences, the dichotomic one.
Nevertheless, our preliminary consideration suggests that the presented theory can be
generalized to arbitrary additive Markov processes with a finite or infinite number of states
and with discrete or continuous "time". A study in this direction is in progress.

The proposed approach can be used for the analysis of other correlated systems in different 
fields of science. 